Restoring Punctuation and Capitalization in Transcribed Speech

Authors

  • Agustin Gravano
  • Martin Jansche
  • Michiel Bacchiani
Abstract

(54) Generating Prosodic Contours for Synthesized Speech
(75) Inventors: Martin Jansche, New York, NY (US); Michael D. Riley, New York, NY (US); Andrew M. Rosenberg, Brooklyn, NY (US); Terry Tai, Jersey City, NJ (US)

Cited U.S. patents:
  • 6,871,178 B2, 3/2005, Case et al.
  • 6,975,987 B1, 12/2005, Tenpaku et al.
  • 6,990,449 B2, 1/2006, Case
  • 6,990,450 B2, 1/2006, Case et al.
  • 7,035,791 B2, 4/2006, Chazan et al.
  • 7,062,439 B2, 6/2006, Brittan et al.
  • 7,076,426 B1, 7/2006, Beutnagel et al.
  • 7,191,132 B2, 3/2007, Brittan et al.
  • 7,200,558 B2*, 4/2007, Kato et al., 704/244


Similar Resources

Recovering capitalization and punctuation marks for automatic speech recognition: Case study for Portuguese broadcast news

The following material presents a study on recovering punctuation marks and capitalization information from European Portuguese broadcast news speech transcriptions. Different approaches were tested for capitalization, both generative and discriminative, using finite state transducers automatically built from language models, and maximum entropy models. Several resources were used, includi...


Recovering Capitalization and Punctuation Marks on Speech Transcriptions

This work addresses two metadata annotation tasks involved in the production of rich transcripts: automatic capitalization and punctuation mark recovery. The main focus is broadcast news, using both manual and automatic speech transcripts. Different capitalization models were analysed and compared, and the results support the idea that generative approaches capture the structure of writte...


Automatic Recovery of Punctuation Marks and Capitalization Information for Iberian Languages

This paper shows experimental results concerning automatic enrichment of the speech recognition output with punctuation marks and capitalization information. The two tasks are treated as two classification problems, using a maximum entropy modeling approach. The approach is language independent as reinforced by experiments performed on Portuguese and Spanish Broadcast News corpora. The discrimi...


Punctuation Prediction with Transition-based Parsing

Punctuation marks are not available in automatic speech recognition output, which can create barriers for many subsequent text processing tasks. This paper proposes a novel method to predict punctuation symbols for the stream of words in transcribed speech texts. Our method jointly performs parsing and punctuation prediction by integrating a rich set of syntactic features when processing words fro...


The Tanl Tagger for Named Entity Recognition on Transcribed Broadcast News at Evalita 2011

The Tanl tagger is a flexible sequence labeller based on a Conditional Markov Model that can be configured to use different classifiers and to extract features according to feature templates expressed through patterns provided in a configuration file. The Tanl Tagger was applied to the task of Named Entity Recognition (NER) on Transcribed Broadcast News at Evalita 2011. The goal of the task was t...




Publication date: 2013